here::here(), exporting figures, dplyr::case_when(), purrr::modify_if()
here package for better file paths
ggplot2::ggsave()
purrr
dplyr::case_when()
Source: https://www.ndbc.noaa.gov/station_history.php?station=ntbc1
Data from: National Data Buoy Center (NOAA), accessed 10/4/2019
Data descriptions: https://www.ndbc.noaa.gov/measdes.shtml
NOTE: the {hexbin} package must already be installed (it does not need to be attached here) for geom_hex() to work.
library(tidyverse)
library(here)
library(janitor)
library(ggridges)
Use here::here() to let R know where to find the data! And use readr::read_table() to “read whitespace-separated columns into a tibble.”
sb_buoy <- readr::read_table(here::here("lab_3_materials", # Note: students will just have "raw_data" here
"raw_data",
"sb_buoy_2018.txt"),
na = c("99", "999", "99.0", "999.0")) %>% # Treat these placeholder codes as missing values
janitor::clean_names() %>%
dplyr::slice(-1) %>% # Use dplyr::slice() to remove or keep rows by position
select(number_yy:gst, atmp)
Now, if I want to write an intermediate file (to keep this cleaned-up version), here::here() builds the output path too:
write_csv(sb_buoy, here("lab_3_materials", "intermediate_data", "sb_buoy.csv"))
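To see what here::here() is actually doing, it can help to print the paths it builds. This is a minimal sketch: the folder and file names ("raw_data", "example.txt") are placeholders, not files from the lab, and the root that gets printed depends on where the code is run.

```r
library(here)

# here() with no arguments returns the project root that {here} detected
root <- here()
root

# Extra arguments are appended as path components, so the same call works
# from any working directory inside the project and on any operating system
# ("raw_data" and "example.txt" are placeholder names for illustration)
example_path <- here("raw_data", "example.txt")
example_path
```

Because the path is built from the project root rather than typed out in full, the script keeps working when the project folder is moved or shared.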
First, check out the month column (mm):
class(sb_buoy$mm) # It's a character
## [1] "character"
unique(sb_buoy$mm) # All the values here
## [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
# Check out the built-in month.abb vector (a vector, not a function)
month.abb # Cool - pre-stored month abbreviations
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
sb_buoy_month <- sb_buoy %>%
mutate(mm = as.numeric(mm)) %>% # Convert month to numeric
mutate(
month_name = month.abb[mm] # Note square brackets here
)
# Check the class of month_name
class(sb_buoy_month$month_name)
## [1] "character"
# But that means if we plot it, R will just do it alphabetically.
# We want to be able to assign these an *order*
# So we need to make it a factor, and specify the factor levels (Jan, Feb, etc.)
# Use mutate() to overwrite the column (carefully...)
sb_buoy_fct <- sb_buoy_month %>%
mutate(
month_name = fct_relevel(month_name, month.abb) # Levels are passed via ..., fct_relevel() has no levels argument
)
### could also arrange(as.numeric(mm)) %>% mutate(month_name = fct_inorder(month_name))
### -- this method would work for (e.g.) the country salmon production
### data where the order is not known in advance.
levels(sb_buoy_fct$month_name)
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
# Now R will understand that order matters, & will plot accordingly
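The difference between character and factor ordering is easy to check on a small vector. The sketch below uses a made-up vector of month abbreviations and base R's factor() rather than the lab data; it only illustrates the idea.

```r
# Toy vector of month abbreviations, deliberately out of calendar order
months_chr <- c("Mar", "Jan", "Dec", "Feb")

# As plain character, sorting is alphabetical
sort(months_chr)

# As a factor with levels taken from month.abb, sorting follows the calendar
months_fct <- factor(months_chr, levels = month.abb)
sort(months_fct)

# All twelve levels are kept, even those with no observations
levels(months_fct)
```

ggplot2 uses the factor levels in the same way to order a discrete axis or a set of facets.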
purrr functions for loops, and some dataviz
First, let’s explore air temperatures (atmp) by month:
ggplot(sb_buoy_fct, aes(x = month_name, y = atmp)) +
geom_jitter() # Weird. Why?
# Always check classes of things: here, notice that all columns were read in as character.
# Not great. Let's convert multiple columns to numeric at once using purrr::modify_if()
YUCK that’s wrong.
Check the class of the variables with summary().
We see that these “values” are stored as characters, when they need to be numeric. One option is to use as.numeric() on each column separately. Or we can use purrr::modify_if() to apply a function to every variable that meets a condition, by looping over the columns.
sb_buoy_num <- sb_buoy_fct %>%
purrr::modify_if(is.character, as.numeric)
### Casey question: what if we had left month_name as character?
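A small made-up data frame shows what purrr::modify_if() changes and what it leaves alone, and hints at the answer to the question above. The toy column values are invented for illustration.

```r
library(purrr)

# Toy data frame (made up for illustration): two character columns that hold
# numbers, plus one factor column that should be left alone
toy <- data.frame(
  atmp = c("12.1", "13.4"),
  wspd = c("2.0", "3.5"),
  month_name = factor(c("Jan", "Feb"), levels = month.abb),
  stringsAsFactors = FALSE
)

# Only columns for which is.character() is TRUE are passed to as.numeric()
toy_num <- modify_if(toy, is.character, as.numeric)
sapply(toy_num, class)

# If month_name were still character, as.numeric() would turn it into NA
# (with a coercion warning), because "Jan" is not a number
suppressWarnings(as.numeric(c("Jan", "Feb")))
```

So converting month_name to a factor first is what protects it from the blanket as.numeric() conversion.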
Now let’s try looking at the air temperatures again:
ggplot(sb_buoy_num, aes(x = month_name, y = atmp)) +
geom_jitter() # Too many points, but trends clear
ggplot(sb_buoy_num, aes(x = month_name, y = atmp)) +
geom_violin() # A little hard to compare still, but good
ggplot(sb_buoy_num, aes(x = atmp)) +
geom_density(aes(fill = month_name),
color = NA,
show.legend = FALSE) + # Cool but useless
facet_wrap(~month_name) +
theme_light()
Hmmm but that’s hard to see because of order, even if we facet_wrap(). Here are a couple of better options:
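Setting the facet variable to NULL in the background layer's data (month_name = NULL) draws the full-year histogram in gray behind every month's facet:
ggplot(sb_buoy_num, aes(x = atmp)) +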
geom_histogram(data = transform(sb_buoy_num, month_name = NULL), fill = "gray90") +
geom_histogram(aes(fill = month_name),
color = NA,
show.legend = FALSE) + # Cool but useless
facet_wrap(~month_name) +
theme_light()
ggsave(here("lab_3_materials", "figures", "temp_hist.png"), height = 6, width = 6)
ggridges
temp_graph <- ggplot(sb_buoy_num, aes(x = atmp, y = month_name)) +
geom_density_ridges(fill = "gray60",
color = "gray10",
size = 0.2) +
scale_x_continuous(limits = c(5, 25)) +
theme_minimal() +
labs(x = "Air temperature (Celsius)",
y = "Month (2018)",
title = "SB buoy monthly temperatures (2018)",
subtitle = "Source: NOAA National Data Buoy Center") +
scale_y_discrete(limits = rev(levels(sb_buoy_num$month_name))) # rev months
temp_graph
What if I wanted to save this?
ggsave(here("lab_3_materials", "figures", "temp_graph.png"), height = 6, width = 6)
# Note: you can update size, resolution, etc. within ggsave
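As a rough sketch of those options, the example below saves a placeholder plot (built from the mtcars dataset, not the buoy data) to a temporary file with an explicit plot object, size units, and resolution.

```r
library(ggplot2)

# Small placeholder plot using a built-in dataset
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary file so nothing in the project is overwritten
out_file <- tempfile(fileext = ".png")

ggsave(
  filename = out_file,
  plot = p,       # explicit plot; by default ggsave() saves the last plot displayed
  width = 6,
  height = 4,
  units = "in",   # units for width and height
  dpi = 300       # resolution in dots per inch
)

file.exists(out_file)
```

Passing plot explicitly avoids accidentally saving a different figure when several plots have been drawn since the one you want.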
Let’s explore a relationship:
Windspeed vs. wind direction?
# A few different graph types:
# Plain scatterplot
ggplot(sb_buoy_num, aes(x = wdir, y = wspd)) +
geom_point(aes(color = wspd))
# 2d density plot
ggplot(sb_buoy_num, aes(x = wdir, y = wspd)) +
geom_density_2d()
# Hex density plot
ggplot(sb_buoy_num, aes(x = wdir, y = wspd)) +
geom_hex(bins = 50) +
scale_fill_gradient(low = "orange", high = "red")
### Casey: this doesn't work for me, nor does the next. Possibly {hexbin} is not installed (see NOTE at top)?
# Or, use scale_fill_gradientn(colors = c("white","yellow","orange","purple"))
# Finalize hex density and wrap by month
ggplot(sb_buoy_num, aes(x = wdir, y = wspd)) +
geom_hex(bins = 30) +
scale_fill_gradientn(colors = c("yellow","orange","purple")) +
scale_y_continuous(limits = c(0, 10), expand = c(0, 0)) +
scale_x_continuous(limits = c(0, 360), expand = c(0, 0)) +
theme_dark() +
facet_wrap(~month_name) # See how it changes by month?
It does look like there is some pattern to the windspeeds (like, we rarely have strong winds coming from ~180 degrees, but a lot of strong winds coming from ~240 degrees). Might make sense here to plot on a polar coordinate system:
ggplot(sb_buoy_num, aes(x = wdir, y = wspd)) +
geom_density_2d(aes(color = month_name),
size = 0.2,
show.legend = FALSE) +
coord_polar() +
scale_x_continuous(breaks = c(0, 90, 180, 270), labels = c("N","E","S","W")) +
facet_wrap(~month_name) +
theme_minimal() +
labs(x = "", y = "windspeed (m/s)", title = "SB windspeed and direction (2018)") # NDBC reports WSPD in m/s
ggsave(here("lab_3_materials", "figures", "wdir_polar.jpg"))
dplyr::case_when() for easier if-else statements
Let’s say that we actually want to explore things at a seasonal level, where:
spring = March, April, May
summer = June, July, August
autumn = September, October, November
winter = December, January, February
Here, we’ll make a new column that contains the season associated with each observation using dplyr::mutate() + dplyr::case_when():
sb_buoy_season <- sb_buoy_num %>%
dplyr::mutate(
season = dplyr::case_when(
month_name %in% c("Mar", "Apr", "May") ~ "spring",
month_name %in% c("Jun", "Jul", "Aug") ~ "summer",
month_name %in% c("Sep", "Oct", "Nov") ~ "autumn",
month_name %in% c("Dec", "Jan", "Feb") ~ "winter"
)
)
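One thing worth knowing about case_when(): any row that matches none of the conditions gets NA. A final catch-all condition makes that case explicit. The sketch below uses a made-up tibble with one deliberately invalid month ("Smarch") and a hypothetical "unknown" label.

```r
library(dplyr)

# Toy data (made up for illustration), including one value that is not a month
toy <- tibble(month_name = c("Jan", "Apr", "Jul", "Oct", "Smarch"))

toy_season <- toy %>%
  mutate(
    season = case_when(
      month_name %in% c("Mar", "Apr", "May") ~ "spring",
      month_name %in% c("Jun", "Jul", "Aug") ~ "summer",
      month_name %in% c("Sep", "Oct", "Nov") ~ "autumn",
      month_name %in% c("Dec", "Jan", "Feb") ~ "winter",
      TRUE ~ "unknown" # catch-all: conditions are checked in order, so this runs last
    )
  )

toy_season
```

Conditions are evaluated top to bottom and the first match wins, which is why the catch-all has to come last.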
Then we could make calculations by season, for example:
What is the mean windspeed by season for 2018?
mean_wind <- sb_buoy_season %>%
group_by(season) %>%
summarize(
mean_wspd = mean(wspd)
)
mean_wind
## # A tibble: 4 x 2
## season mean_wspd
## <chr> <dbl>
## 1 autumn 2.49
## 2 spring 2.57
## 3 summer 2.73
## 4 winter 2.15
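Because the missing-value codes were converted to NA when the file was read, it is worth remembering that mean() returns NA for any group containing a missing value unless na.rm = TRUE is set. A minimal sketch with made-up windspeeds:

```r
library(dplyr)

# Toy data (made up for illustration) with one missing windspeed
toy <- tibble(
  season = c("winter", "winter", "summer", "summer"),
  wspd = c(2.0, NA, 3.0, 4.0)
)

result <- toy %>%
  group_by(season) %>%
  summarize(
    mean_wspd = mean(wspd),                    # NA if any value in the group is NA
    mean_wspd_narm = mean(wspd, na.rm = TRUE)  # drops NA values before averaging
  )

result
```

If a seasonal mean ever comes back as NA in the real data, this is the first thing to check.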